System and method for automated establishment of experience ratings and/or risk reserves

ABSTRACT

System and method for automated experience rating and/or loss reserving for events, a certain event P_(i,f) of an initial year i including development values P_(ikf) with development year k. For i and k, i=1, . . . ,K and k=1, . . . ,K apply, K being the last known development year, and for the first initial year i=1 all development values P_(1kf) being known. To determine the development values P_(i,K−(i−j)+1,f), (i−1) neural networks N_(i,j) are generated iteratively for each initial year i, whereby j=1, . . . ,(i−1) indexes the iterations for a particular initial year i and whereby the neural network N_(i,j+1) depends recursively on the neural network N_(i,j). In particular, the system and method are suitable for experience rating for insurance contracts and/or excess of loss reinsurance contracts.

The invention relates to a system and a method for automated experience rating and/or loss reserving, a certain event P_(if) of an initial time interval i with f=1, . . . ,F_(i) including development values P_(ikf) for a sequence of development intervals k=1, . . . ,K. For the events P_(1f) of the first initial time interval i=1, all development values P_(1kf) with f=1, . . . ,F₁ are known. The invention relates particularly to a computer program product for carrying out this method.

Experience rating relates in the prior art to value developments of parameters of events which take place for the first time in a certain year, the incidence year or initial year, and the consequences of which propagate over several years, the so-called development years. Expressed more generally, the events take place at a certain point in time, and develop at given time intervals. Furthermore, the event values of the same event demonstrate over the different development years or development time intervals a dependent, retrospective development. The experience rating of the values takes place through extrapolation and/or comparison with the value development of known similar events in the past.

A typical example in the prior art is the several years' experience rating based upon damage events, e.g., of the payment status Z or the reserve status R of a damage event at insurance companies or reinsurers. In the experience rating of damage events, an insurance company knows the development of every single damage event from the time of the advice of damage up to the current status or until adjustment. In the case of experience rating, the establishment of the classic credibility formula through a stochastic model dates from about 30 years ago; since then, numerous variants of the model have been developed, so that today an actual credibility theory may be spoken of. The chief problem in the application of credibility formulae consists of the unknown parameters which are determined by the structure of the portfolio. As an alternative to known methods of estimation, a game-theory approach is also offered in the prior art, for instance: the actuary or insurance statistician knows bounds for the parameter, and determines the optimal premium for the least favorable case. The credibility theory also comprises a number of models for reserving for long-term effects. Included are a variety of reserving methods which, unlike the credibility formula, do not depend upon unknown parameters. Here, too, the prior art comprises methods by stochastic models which describe the generation of the data. A series of results exist above all for the chain-ladder method as one of the best known methods for calculating outstanding payment claims and/or for extrapolation of the damage events. The strong points of the chain-ladder method are its simplicity, on the one hand, and, on the other hand, that the method is nearly distribution-free, i.e., the method is based on almost no assumptions. Distribution-free or non-parametric methods are particularly suited to cases in which the user can give insufficient details or no details at all concerning the distribution to be expected (e.g., Gaussian distribution, etc.) of the parameter to be developed.

The chain-ladder method means that of an event or loss P_(if) with f=1, 2, . . . ,F_(i) from incidence year i=1, . . . ,I, values P_(ikf) are known, wherein P_(ikf) may be, e.g., the payment status or the reserve status at the end of each handling year k=1, . . . ,K. Therefore, an event P_(if) consists in this case in a sequence of points $P_{if} = (P_{i1f}, P_{i2f}, \ldots, P_{iKf})$

of which the first K+1−i points are known, and the yet unknown points (P_(i,K+2−i,f), . . . , P_(i,K,f)) are to be predicted. The values of the events P_(if) form a so-called loss triangle or, more generally, an event-values triangle $\begin{pmatrix} P_{11f} & P_{12f} & P_{13f} & P_{14f} & P_{15f} \\ P_{21f} & P_{22f} & P_{23f} & P_{24f} & \\ P_{31f} & P_{32f} & P_{33f} & & \\ P_{41f} & P_{42f} & & & \\ P_{51f} & & & & \end{pmatrix}$ in which the index f in row i runs over f=1, . . . ,F_(i).

The lines and columns are formed by the damage-incidence years and the handling years. Generally speaking, e.g., the lines show the initial years, and the columns show the development years of the examined events, it also being possible for the presentation to be different from that. Now, the chain-ladder method is based upon the cumulated loss triangles, the entries C_(ij) of which are, e.g., either mere loss payments or loss expenditures (loss payments plus change in the loss reserves). Valid for the cumulated array elements C_(ij) is $C_{ij} = \sum_{f=1}^{F_i} P_{ijf}$

from which follows $\begin{pmatrix} \sum_{f=1}^{F_1} P_{11f} & \sum_{f=1}^{F_1} P_{12f} & \sum_{f=1}^{F_1} P_{13f} & \sum_{f=1}^{F_1} P_{14f} & \sum_{f=1}^{F_1} P_{15f} \\ \sum_{f=1}^{F_2} P_{21f} & \sum_{f=1}^{F_2} P_{22f} & \sum_{f=1}^{F_2} P_{23f} & \sum_{f=1}^{F_2} P_{24f} & \\ \sum_{f=1}^{F_3} P_{31f} & \sum_{f=1}^{F_3} P_{32f} & \sum_{f=1}^{F_3} P_{33f} & & \\ \sum_{f=1}^{F_4} P_{41f} & \sum_{f=1}^{F_4} P_{42f} & & & \\ \sum_{f=1}^{F_5} P_{51f} & & & & \end{pmatrix}$
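The cumulation and the subsequent chain-ladder extrapolation can be illustrated with a minimal sketch. It assumes that each initial year is stored as an array of shape (F_i, K) with NaN marking unknown values, that the values are statuses accumulated over time, and that the usual volume-weighted development factors are used; the function names `cumulate` and `chain_ladder` are hypothetical and not taken from the source.

```python
import numpy as np

def cumulate(events):
    """Sum the individual events of each initial year: C[i, k] = sum_f P[i][f, k].

    events[i] has shape (F_i, K); unknown development values are NaN, so a
    column that is unknown for a year stays NaN after summation.
    """
    return np.array([np.sum(e, axis=0) for e in events])

def chain_ladder(C):
    """Fill the unknown lower-right part of a cumulated triangle (assumed convention)."""
    C = C.copy()
    n_dev = C.shape[1]
    for k in range(n_dev - 1):
        known = ~np.isnan(C[:, k + 1])          # years with an observed step k -> k+1
        factor = C[known, k + 1].sum() / C[known, k].sum()
        missing = ~known
        C[missing, k + 1] = C[missing, k] * factor
    return C

# Illustrative data only: three initial years, K = 3 development years.
events = [
    np.array([[100.0, 150.0, 160.0], [50.0, 60.0, 70.0]]),
    np.array([[80.0, 120.0, np.nan], [40.0, 60.0, np.nan]]),
    np.array([[200.0, np.nan, np.nan]]),
]
C = cumulate(events)
print(chain_ladder(C))
```

Note that the result only contains one value per initial year and development year; the per-event rows of `events` can no longer be recovered from `C`.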

From the cumulated values extrapolated by means of the chain-ladder method, the individual event can also again be judged in that a certain distribution, e.g., typically a Pareto distribution, of the values is assumed. The Pareto distribution is particularly suited to insurance types such as, e.g., insurance of major losses or reinsurers, etc. The Pareto distribution takes the following form $\Theta(x) = 1 - \left(\frac{x}{T}\right)^{\alpha}$

wherein T is a threshold value, and α is the fit parameter. The simplicity of the chain-ladder method resides especially in the fact that for application it needs no more than the above loss triangle (cumulated via the development values of the individual events) and, e.g., no information concerning reporting dates, reserving procedures, or assumptions concerning possible distributions of loss amounts, etc. The drawbacks of the chain-ladder method are sufficiently known in the prior art (see, e.g., Thomas Mack, Measuring the Variability of Chain Ladder Reserve Estimates, submitted CAS Prize Paper Competition 1993; Greg Taylor, Chain Ladder Bias, Centre for Actuarial Studies, University of Melbourne, Australia, March 2001, pp 3). In order to obtain a good estimate value, a sufficient data history is necessary. In particular, the chain-ladder method proves successful in classes of business such as motor vehicle liability insurance, for example, where the differences in the loss years are attributable in great part to differences in the loss frequencies, since the estimators of the chain-ladder method correspond to the maximum likelihood estimators of a model by means of modified Poisson distribution. Hence caution is advisable, e.g., in the case of years in which changes in the loss amount distribution are made (e.g., an increase in the maximum liability sum or changes in the retention), since these changes may lead to structural failures in the chain-ladder method. In classes of business having extremely long run-off time, such as general liability insurance, the use of the chain-ladder method likewise leads in many cases to usable results although data, such as a reliable estimate of the final loss quota, for example, are seldom available on account of the long run-off time. However, the main drawback of the chain-ladder method resides in the fact that the chain-ladder method is based upon the cumulated loss triangle, i.e., through the cumulation of the event values of the events having the same initial year, essential information concerning the individual losses and/or events is lost and can no longer be recovered later on.

Known in the prior art is a method of T. Mack (Thomas Mack, Schriftreihe Angewandte Versicherungsmathematik, booklet 28, pp. 310 ff., Verlag Versicherungswirtschaft E. V., Karlsruhe 1997) in which the values can be propagated, i.e., the values in the loss triangle can be extrapolated without loss of the information on the individual events. With the Mack method, therefore, using the complete numerical basis for each loss, an individual IBNER reserve can be calculated (IBNER: Incurred But Not Enough Reported). IBNER demands are understood to mean payment demands which are either over the predicted values or are still outstanding. The IBNER reserve is useful especially for experience rating of excess of loss reinsurance contracts, where the reinsurer, as a rule, receives the required individual loss data, at least for the relevant major losses. In the case of the reinsurer, the temporal development of a portfolio of risks is described through a risk process in which the damage figures and loss amounts are modeled, whereby in the excess of loss reinsurance, upon the transition from the original insurer to the reinsurer, the phenomenon of the accidental dilution of the risk process arises; on the other hand, through reinsurance, portfolios of several original insurers are combined and risk processes thus caused to overlap. The effects of dilution and overlapping have, until now, been examined above all for Poisson risk processes. For insurance/reinsurance, experience rating by means of the Mack method means that of each loss P_(if), with f=1, 2, . . . ,F_(i) from incidence year or initial year i=1, . . . ,I, the payment status Z_(ikf) and the reserve status R_(ikf) at the end of each handling year or development year k=1, . . . ,K until the current status (Z_(i,K+1−i,f), R_(i,K+1−i,f)) is known. A loss P_(if) in this case therefore consists of a sequence of points $P_{if} = (Z_{i1f}, R_{i1f}), (Z_{i2f}, R_{i2f}), \ldots, (Z_{iKf}, R_{iKf})$

at the payment reserve level, of which the first K+1−i points are known, and the still unknown points (Z_(i,K+2−i,f), R_(i,K+2−i,f)), . . . , (Z_(i,K,f), R_(i,K,f)) are supposed to be predicted. Of particular interest is, naturally, the final status (Z_(i,K,f), R_(i,K,f)), R_(i,K,f) being equal to 0 in the ideal case, i.e., the claim is regarded as completely settled; whether this can be achieved depends upon the length K of the development period considered. In the prior art, as e.g. in the Mack method, a claim status (Z_(i,K+1−i,f), R_(i,K+1−i,f)) is continued as was the case in similar claims from earlier incidence years. In the conventional methods, therefore, it must be determined, for one thing, when two claims are “similar,” and for another thing, what it means to “continue” a claim. Furthermore, besides the IBNER reserve thus resulting, it must be determined, in a second step, how the genuine belated claims are to be calculated, about which nothing is as yet known at the present time.

For qualifying the similarity, e.g., the Euclidean distance $d((Z,R),(\tilde{Z},\tilde{R})) = \sqrt{(Z-\tilde{Z})^{2} + (R-\tilde{R})^{2}}$

is used at the payment reserve level in the prior art. But also with the Euclidean distance there are many possibilities for finding for a given claim (P_(i,1,f), P_(i,2,f), . . . , P_(i,K+1−i,f)) the closest most similar claim of an earlier incidence year, i.e., the claim $(\tilde{P}_1, \ldots, \tilde{P}_k)$ with k>K+1−i, for which either $\sum_{j=1}^{K+1-i} d(P_{ijf}, \tilde{P}_j)$ (sum of all previous distances) or $\sum_{j=1}^{K+1-i} j \cdot d(P_{ijf}, \tilde{P}_j)$ (weighted sum of all distances) or $\max_{1 \leq j \leq K+1-i} d(P_{ijf}, \tilde{P}_j)$ (maximum distance) or $d(P_{i,K+1-i,f}, \tilde{P}_{K+1-i})$ (current distance) is minimal.
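The four similarity criteria can be compared with a short sketch. It assumes that a claim is a list of (Z, R) pairs, one per known development year, and that a model claim qualifies only if it is known for more development years than the claim to be continued; the names `distances` and `most_similar` are hypothetical.

```python
import math

def d(p, q):
    """Euclidean distance between two claim statuses (Z, R)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def distances(claim, model):
    """Evaluate the four criteria over the development years known for `claim`."""
    ds = [d(claim[j], model[j]) for j in range(len(claim))]
    return {
        "sum": sum(ds),                                          # sum of all previous distances
        "weighted": sum((j + 1) * x for j, x in enumerate(ds)),  # weights j = 1, 2, ...
        "max": max(ds),                                          # maximum distance
        "current": ds[-1],                                       # distance of the latest status
    }

def most_similar(claim, models, criterion="current"):
    """Return the model claim minimizing the chosen criterion."""
    # Only models that are known beyond the last known status can continue the claim.
    candidates = [m for m in models if len(m) > len(claim)]
    return min(candidates, key=lambda m: distances(claim, m)[criterion])

# Illustrative data only.
claim = [(0.0, 10.0), (5.0, 5.0)]
model_a = [(0.0, 10.0), (5.0, 6.0), (8.0, 2.0)]
model_b = [(3.0, 14.0), (5.0, 5.0), (9.0, 0.0)]
print(most_similar(claim, [model_a, model_b], "current") is model_b)  # True
print(most_similar(claim, [model_a, model_b], "sum") is model_a)      # True
```

The example shows that the criteria need not agree: `model_b` matches the current status exactly, whereas `model_a` is closer over the whole known history.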

In the example of the Mack method, normally the current distance is used. This means that for a claim (P₁, . . . ,P_(k)), the handling of which is known up to the k-th development year, of all other claims $(\tilde{P}_1, \ldots, \tilde{P}_j)$, the development of which is known at least up to the development year j≥k+1, the one considered as the most similar is the one for which the current distance $d(P_k, \tilde{P}_k)$ is smallest.

The claim (P₁, . . . ,P_(k)) is now continued as is the case for its closest-distance “model” $(\tilde{P}_1, \ldots, \tilde{P}_k, \tilde{P}_{k+1}, \ldots, \tilde{P}_j)$. For doing this, there is the possibility of continuing for a single handling year (i.e., up to P_(k+1)) or for several development years at the same time (e.g., up to P_(j)). In methods such as the Mack method, for instance, one typically first continues for just one handling year in order to search then again for a new most similar claim, whereby the claim just continued is continued for a further development year. The next claim found may naturally also again be the same one. For continuation of the damage claims, there are two possibilities. The additive continuation of P_(k)=(Z_(k),R_(k)) $\hat{P}_{k+1} = (\hat{Z}_{k+1}, \hat{R}_{k+1}) = (Z_k + \tilde{Z}_{k+1} - \tilde{Z}_k,\; R_k + \tilde{R}_{k+1} - \tilde{R}_k),$

and the multiplicative continuation of P_(k)=(Z_(k),R_(k)) $\hat{P}_{k+1} = (\hat{Z}_{k+1}, \hat{R}_{k+1}) = \left( Z_k \cdot \frac{\tilde{Z}_{k+1}}{\tilde{Z}_k},\; R_k \cdot \frac{\tilde{R}_{k+1}}{\tilde{R}_k} \right).$
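Both continuations can be written down directly from the two formulas. The following is a minimal sketch in which a claim status is a pair (Z, R); the function names are hypothetical, and raising an error for a zero model component is an assumption of this sketch rather than part of the described method.

```python
def continue_additive(p_k, model_k, model_k1):
    """Additive continuation: add the model's increments to the claim status."""
    z, r = p_k
    return (z + model_k1[0] - model_k[0], r + model_k1[1] - model_k[1])

def continue_multiplicative(p_k, model_k, model_k1):
    """Multiplicative continuation: apply the model's growth factors to the claim status."""
    z, r = p_k
    if model_k[0] == 0 or model_k[1] == 0:
        # The development factor is undefined when the model status has a zero component.
        raise ZeroDivisionError("model status has a zero component")
    return (z * model_k1[0] / model_k[0], r * model_k1[1] / model_k[1])

# Illustrative data only: claim status and the model's step from year k to k+1.
p_k = (100.0, 50.0)
model_k, model_k1 = (80.0, 40.0), (120.0, 20.0)
print(continue_additive(p_k, model_k, model_k1))        # (140.0, 30.0)
print(continue_multiplicative(p_k, model_k, model_k1))  # (150.0, 25.0)
```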

It is easy to see that one of the drawbacks of the prior art, especially of the Mack method, resides, among other things, in the type of continuation of the damage claims. The multiplicative continuation is useful only for so-called open claim statuses, i.e., Z_(k)>0, R_(k)>0. In the case of probable claim statuses P_(k)=(0, R_(k)), R_(k)>0, the multiplicative continuation must be modified since otherwise no continuation takes place. Moreover, if $\tilde{Z}_k=0$ or $\tilde{R}_k=0$, a division by 0 takes place. Similarly, if $\tilde{Z}_k$ or $\tilde{R}_k$ is small, the multiplicative method may easily lead to unrealistically high continuations. This does not permit a consistent treatment of the cases. This means that the reserve R_(k) cannot be simply continued in this case. In the same way, an adjusted claim status P_(k)=(Z_(k), 0), Z_(k)>0 can likewise not be further developed. One possibility is simply to leave it unchanged. However, a revival of a claim is thereby prevented. At best it could be continued on the basis of the closest adjusted model, which likewise does not permit a consistent treatment of the cases. Also with the additive continuation, probable claim statuses should meaningfully be continued only on the basis of a likewise probable model in order to minimize the Euclidean distance and to guarantee a corresponding qualification of the similarity. An analogous drawback arises in the case of adjusted claim statuses, if a revival is supposed to be allowed and negative reserves are supposed to be avoided. Quite generally, the additive method can easily lead to negative payments and/or reserves. In addition, in the prior art, a claim P_(k) cannot be continued if no corresponding model exists without further assumptions being inserted into the method. An example thereof is an open claim P_(k) when in the same handling year k there is no claim from previous incidence years in which $\tilde{P}_k$ is likewise open. A way out of the dilemma can be found in that, for this case, P_(k) is left unchanged, i.e., $\hat{P}_{k+1}=P_k$, which of course does not correspond to any true continuation.

Thus, all in all, in the prior art every current claim status P_(i,K+1−i,f)=(Z_(i,K+1−i,f), R_(i,K+1−i,f)) is further developed step by step either additively or multiplicatively up to the end of development and/or handling after K development years. Here, in each step, the nearest model claim status of the same claim status type (probable, open, or adjusted), according to the Euclidean distance in each case, is ascertained, and the claim status to be continued is continued either additively or multiplicatively according to the further development of the model claim. For the Mack method, it is likewise sensible always to take into consideration as model only actually observed claim developments $\tilde{P}_k \rightarrow \tilde{P}_{k+1}$ and no extrapolated, i.e., developed claim developments since otherwise a correlation and/or a corresponding bias of the events is not to be avoided. Conversely, however, the drawback is maintained that already known information of events is lost.

From the construction of the prior art methods it is immediately clear that the methods can also be applied separately, on the one hand to the triangle of payments, on the other hand to the triangle of reserves. Naturally, with the way of proceeding described, other possibilities could also be permitted in order to find the closest claim status as model in each case. However, this would have an effect particularly on the distribution freedom of the method. It may thereby be said that in the prior art, the above-mentioned systematic problems cannot be eliminated even by respective modifications, or at best only in that further model assumptions are inserted into the method. Precisely in the case of complex dynamically non-linear processes, however, as e.g. the development of damage claims, this is not desirable in most cases. Even putting aside the mentioned drawbacks, it must still always be determined, in the conventional method according to T. Mack, when two claims are similar and what it means to continue a claim, whereby, therefore, minimum basic assumptions and/or model assumptions must be made. In the prior art, however, not only is the choice of Euclidean metrics arbitrary, but also the choice between the mentioned multiplicative and additive methods. Furthermore, the estimation of error is not defined in detail in the prior art. It is true that it is conceivable to define an error, e.g., based on the inverse distance. However, this is not disclosed in the prior art. An important drawback of the prior art is also, however, that each event must be compared with all the previous ones in order to be able to be continued. The expenditure increases linearly with the number of years and linearly with the number of claims in the portfolio. When portfolios are aggregated, the computing effort and the memory requirement increase accordingly.

Neural networks are fundamentally known in the prior art, and are used, for instance, for solving optimization problems, image recognition (pattern recognition), in artificial intelligence, etc. Corresponding to biological nerve networks, a neural network consists of a plurality of network nodes, so-called neurons, which are interconnected via weighted connections (synapses). The neurons are organized in network layers (layers) and interconnected. The individual neurons are activated in dependence upon their input signals and generate a corresponding output signal. The activation of a neuron takes place via an individual weight factor by the summation over the input signals. Such neural networks are adaptive by systematically changing the weight factors as a function of given exemplary input and output values until the neural network shows a desired behavior in a defined, predictable error span, such as the prediction of output values for future input values, for example. Neural networks thereby exhibit adaptive capabilities for learning and storing knowledge and associative capabilities for the comparison of new information with stored knowledge. The neurons (network nodes) may assume a resting state or an excitation state. Each neuron has a plurality of inputs and just one output which is connected to the inputs of other neurons of the following network layer or, in the case of an output node, represents a corresponding output value. A neuron enters the excitation state when a sufficient number of the inputs of the neuron are excited over a certain threshold value of the neuron, i.e., if the summation over the inputs reaches a certain threshold value. In the weights of the inputs of a neuron and in the threshold value of the neuron, the knowledge is stored through adaptation. The weights of a neural network are trained by means of a learning process (see, e.g., G. Cybenko, “Approximation by Superpositions of a sigmoidal function,” Math. Control, Sig. Syst., 2, 1989, pp. 303-314; M. T. Hagan, M. B. Menhaj, “Training Feed-forward Networks with the Marquardt Algorithm,” IEEE Transactions on Neural Networks, Vol. 5, No. 6, pp. 989-993, November 1994; K. Hornik, M. Stinchcombe, H. White, “Multilayer Feed-forward Networks are Universal Approximators,” Neural Networks, 2, 1989, pp. 359-366, etc.).

It is a task of this invention to propose a new system and method for automated experience rating of events and/or loss reserving which does not exhibit the above-mentioned drawbacks of the prior art. In particular, an automated, simple, and rational method shall be proposed in order to develop a given claim further with an individual increase and/or factor so that subsequently all the information concerning the development of a single claim is available. With the method, as few assumptions as possible shall be made from the outset concerning the distribution, and at the same time the maximum possible information on the given cases shall be exploited.

According to the present invention, this goal is achieved in particular by means of the elements of the independent claims. Further advantageous embodiments follow moreover from the dependent claims and the description.

In particular, these goals are achieved by the invention in that development values P_(i,k,f) having development intervals k=1, . . . ,K are assigned to a certain event P_(i,f) of an initial time interval i, wherein K is the last known development interval, with i=1, . . . ,K, and for the events P_(1,f) all development values P_(1kf) are known, at least one neural network being used for determining the development values P_(i,K+2−i,f), . . . , P_(iKf). In the case of certain events, e.g., the initial time interval can be assigned to an initial year, and the development intervals can be assigned to development years. The development values P_(ikf) of the various events P_(i,f) can, according to their initial time interval, be scaled by means of at least one scaling factor. The scaling of the development values P_(ikf) has the advantage, among others, that the development values are comparable at differing points in time. This variant embodiment further has the advantage, among others, that for the automated experience rating no model assumptions need be presupposed, e.g. concerning value distributions, system dynamics, etc. In particular, the experience rating is free of proximity preconditions, such as the Euclidean measure, etc., for example. This is not possible in this way in the prior art. In addition, the entire information of the data sample is used, without the data records being cumulated. The complete information concerning the individual events is kept in each step, and can be called up again at the end. The scaling has the advantage that data records of differing initial time intervals receive comparable orders of magnitude, and can thus be better compared.

In one variant embodiment, for determining the development values P_(i,K−(i−j)+1,f), (i−1) neural networks N_(ij) are generated iteratively with j=1, . . . ,(i−1) for each initial time interval and/or initial year i, the neural network N_(i,j+1) depending recursively on the neural network N_(ij). For weighting a certain neural network N_(i,j), the development values P_(p,q,f) can be used, for example, with p=1, . . . ,(i−1) and q=1, . . . ,K−(i−j). This variant embodiment has the advantage, among others, that, as in the preceding variant embodiment, the entire information of the data sample is used, without the data records being cumulated. The complete information concerning the individual events is maintained in each step, and can be called up again at the end. By means of a minimizing of a globally introduced error, the networks can be additionally optimized.

In another variant embodiment, the neural networks N_(i,j) are identically trained for identical development years and/or development intervals j, the neural network N_(i+1,j=i) being generated for an initial time interval and/or initial year i+1, and all other neural networks N_(i+1,j<i) being taken over from previous initial time intervals and/or initial years. This variant embodiment has the advantage, among others, that only known data are used for the experience rating, and certain data are not used further by the system, whereby the correlation of the errors or respectively of the data is prevented.

In a still different variant embodiment, events P_(i,f) with initial time interval i<1 are additionally used for determination, all development values P_(i<1,k,f) for the events P_(i<1,f) being known. This variant embodiment has the advantage, among others, that by means of the additional data records the neural networks can be better optimized, and their errors can be minimized.

In a further variant embodiment, for the automated experience rating and/or loss reserving, development values P_(i,k,f) with development intervals k=1, . . . ,K are stored assigned to a certain event P_(i,f) of an initial time interval i, in which i=1, . . . ,K, and K is the last known development interval, and in which for the first initial time interval all development values P_(1,k,f) are known. For each initial time interval i=2, . . . ,K, iterations j=1, . . . ,(i−1) are carried out, whereby upon each iteration j, in a first step, a neural network N_(ij) is generated having an input layer with K−(i−j) input segments and an output layer, which input segments comprise at least one input neuron and are assigned to a development value P_(i,k,f); in a second step, the neural network N_(i,j) is weighted with the available events P_(m,f) of all initial time intervals m=1, . . . ,(i−1) by means of the development values P_(m,1 . . . K−(i−j),f) as input and P_(m,1 . . . K−(i−j)+1,f) as output; and in a third step, by means of the neural network N_(i,j), the output values O_(i,f) are determined for all events P_(i,f) of the initial time interval i, the output value O_(i,f) being assigned to the development value P_(i,K−(i−j)+1,f) of the event P_(i,f), and the neural network N_(i,j+1) being dependent recursively on the neural network N_(ij). In the case of certain events, e.g., the initial time interval can be assigned to an initial year, and the development intervals assigned to development years. This variant embodiment has the same advantages, among others, as the preceding variant embodiments.
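The three steps of each iteration j can be sketched as follows. This is an illustrative sketch under several assumptions that are not taken from the source: each development value is a single scalar (rather than a vector such as (Z, R)), each network is a small one-hidden-layer perceptron trained by plain gradient descent, only the next development value is used as training target, a single global scaling factor is applied, and predicted values of earlier initial years are reused for training later networks. The names `train_mlp` and `complete_triangle` are hypothetical.

```python
import numpy as np

def train_mlp(X, y, hidden=4, epochs=2000, lr=0.05, seed=0):
    """Train a tiny one-hidden-layer network on (X, y) with gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations
        err = H @ W2 + b2 - y                     # local prediction error
        dH = np.outer(err, W2) * (1.0 - H ** 2)   # error propagated to the hidden layer
        W2 -= lr * (H.T @ err) / len(y)
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / len(y)
        b1 -= lr * dH.mean(axis=0)
    return lambda X_new: np.tanh(X_new @ W1 + b1) @ W2 + b2

def complete_triangle(data):
    """data[i-1]: array (F_i, K) for initial year i, NaN where values are unknown."""
    K = data[0].shape[1]
    scale = np.nanmax(np.concatenate([d.ravel() for d in data]))
    P = [d / scale for d in data]                 # scaled copies of the input
    for i in range(2, K + 1):                     # initial years i = 2..K
        for j in range(1, i):                     # iterations j = 1..i-1
            n_in = K - (i - j)                    # step 1: K-(i-j) input segments
            # Step 2: weight N_ij with the events of initial years m = 1..i-1.
            X = np.vstack([P[m][:, :n_in] for m in range(i - 1)])
            y = np.concatenate([P[m][:, n_in] for m in range(i - 1)])
            net = train_mlp(X, y)
            # Step 3: the output becomes development value K-(i-j)+1 of year i.
            P[i - 1][:, n_in] = net(P[i - 1][:, :n_in])
    return [p * scale for p in P]

# Illustrative data only: K = 3, individual events kept separate (not cumulated).
data = [
    np.array([[10.0, 15.0, 18.0], [20.0, 30.0, 36.0],
              [30.0, 45.0, 54.0], [40.0, 60.0, 72.0]]),
    np.array([[12.0, 18.0, np.nan], [25.0, 37.5, np.nan]]),
    np.array([[16.0, np.nan, np.nan]]),
]
for year in complete_triangle(data):
    print(np.round(year, 1))
```

Because every event keeps its own row, the completed triangle still contains the full development of each individual event, and the value predicted in iteration j serves as an additional input in iteration j+1.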

In one variant embodiment, a system comprises neural networks N_(i) each having an input layer with at least one input segment and an output layer, which input and output layer comprises a plurality of neurons which are interconnected in a weighted way, the neural networks N_(i) being iteratively producible by means of a data processing unit through software and/or hardware, a neural network N_(i+1) depending recursively on the neural network N_(i), and each network N_(i+1) comprising in each case one input segment more than the network N_(i), each neural network N_(i), beginning with the neural network N₁, being trainable by means of a minimization module through minimizing of a locally propagated error, and the recursive system of neural networks being trainable by means of a minimization module through minimization of a globally propagated error based upon the local errors of the neural networks N_(i). This variant embodiment has the advantage, among others, that the recursively generated neural networks can be additionally optimized by means of the global error. Among other things, it is the combination of the recursive generation of the neural network structure with a double minimization by means of locally propagated error and globally propagated error which results in the advantages of the variant embodiment.

In another variant embodiment, the output layer of the neural network N_(i) is connected in an assigned way to at least one input segment of the input layer of the neural network N_(i+1). This variant embodiment has the advantage, among others, that the system of neural networks can in turn be interpreted as a neural network. Thus partial networks of a whole network may be locally weighted, and also in the case of global learning can be checked and monitored in their behavior by the system by means of the corresponding data records. This has not been possible until now in this way in the prior art.

At this point, it shall be stated that besides the method according to the invention, the present invention also relates to a system for carrying out this method. Furthermore, it is not limited to the said system and method, but equally relates to recursively nested systems of neural networks and a computer program product for implementing the method according to the invention.

Variant embodiments of the present invention are described below on the basis of examples. The examples of the embodiments are illustrated by the following accompanying figures:

FIG. 1 shows a block diagram which reproduces schematically the training and/or determination phase or presentation phase of a neural network for determining the event value P_(2,5,f) of an event P_(f) in an upper 5×5 matrix, i.e., with K=5. The dashed line T indicates the training phase, and the solid line R the determination phase after learning.

FIG. 2 likewise shows a block diagram which, like FIG. 1, reproduces schematically the training and/or determination phase of a neural network for determining the event value P_(3,4,f) for the third initial year.

FIG. 3 shows a block diagram which, like FIG. 1, reproduces schematically the training and/or determination phase of a neural network for determining the event value P_(3,5,f) for the third initial year.

FIG. 4 shows a block diagram which schematically shows only the training phase for determining P_(3,4,f) and P_(3,5,f), the calculated values P_(3,4,f) being used for training the network for determining P_(3,5,f).

FIG. 5 shows a block diagram which schematically shows the recursive generation of neural networks for determining the values in line 3 of a 5×5 matrix, two networks being generated.

FIG. 6 shows a block diagram which schematically shows the recursive generation of neural networks for determining the values in line 5 of a 5×5 matrix, four networks being generated.

FIG. 7 shows a block diagram which likewise shows schematically a system according to the invention, the training basis being restricted to the known event values A_(ij).

FIGS. 1 to 7 illustrate schematically an architecture which may be used for implementing the invention. In this embodiment example, a certain event P_(i,f) of an initial year i includes development values P_(ikf) for the automated experience rating of events and/or loss reserving. The index f runs over all events P_(i,f) for a certain initial year i with f=1, . . . ,F_(i). The development value P_(ikf)=(Z_(ikf), R_(ikf), . . . ) is any vector and/or n-tuple of development parameters Z_(ikf), R_(ikf), . . . , which is supposed to be developed for an event. Thus, for example, in the case of insurance for a damage event P_(ikf), Z_(ikf) can be the payment status, R_(ikf) the reserve status, etc. Any desired further relevant parameters for an event are conceivable without this affecting the scope of protection of the invention. The development years k proceed from k=1, . . . ,K, and the initial years from i=1, . . . ,I. K is the last known development year. For the first initial year i=1, all development values P_(1kf) are given. As already indicated, for this example the number of initial years I and the number of development years K are supposed to be the same, i.e., I=K. However, it is quite conceivable that I≠K, without the method or the system being thereby limited. P_(ikf) is therefore an n-tuple consisting of the sequence of points and/or matrix elements (Z_(ikf), R_(ikf), . . . ) with k=1, 2, . . . , K.

With I=K the result is thereby a square upper triangular matrix and/or block triangular matrix for the known development values P_(ikf)
$\begin{pmatrix}P_{11f,\,f=1\ldots F_{1}} & P_{12f,\,f=1\ldots F_{1}} & P_{13f,\,f=1\ldots F_{1}} & P_{14f,\,f=1\ldots F_{1}} & P_{15f,\,f=1\ldots F_{1}} \\P_{21f,\,f=1\ldots F_{2}} & P_{22f,\,f=1\ldots F_{2}} & P_{23f,\,f=1\ldots F_{2}} & P_{24f,\,f=1\ldots F_{2}} & \\P_{31f,\,f=1\ldots F_{3}} & P_{32f,\,f=1\ldots F_{3}} & P_{33f,\,f=1\ldots F_{3}} & & \\P_{41f,\,f=1\ldots F_{4}} & P_{42f,\,f=1\ldots F_{4}} & & & \\P_{51f,\,f=1\ldots F_{5}} & & & & \end{pmatrix}$

again with f=1, . . . ,F_(i) going over all events for a certain initial year. Thus, the lines of the matrix are assigned to the initial years and the columns of the matrix to the development years. In the embodiment example, P_(ikf) shall be limited to the example of damage events with insurance since in particular the method and/or the system is very suitable, e.g., for the experience rating of insurance contracts and/or excess of loss reinsurance contracts. It must be emphasized that the matrix elements P_(ikf) may themselves again be vectors and/or matrices, whereupon the above matrix becomes a corresponding block matrix. The method and system according to the invention is, however, suitable for experience rating and/or for extrapolation of time-delayed non-linear processes quite generally. That being said, P_(ikf) is a sequence of points (Z_(ikf), R_(ikf), . . . ) with k=1, 2, . . . , K

at the payment reserve level, the first K+1−i points of which are known, and the still unknown points (Z_(i,K+2−i,f), R_(i,K+2−i,f)), . . . , (Z_(iKf), R_(iKf)) are supposed to be predicted. If, for this example, P_(ikf) is divided into payment level and reserve level, the result obtained analogously for the payment level is the triangular matrix
$\begin{pmatrix}Z_{11f} & Z_{12f} & Z_{13f} & Z_{14f} & Z_{15f} \\Z_{21f} & Z_{22f} & Z_{23f} & Z_{24f} & \\Z_{31f} & Z_{32f} & Z_{33f} & & \\Z_{41f} & Z_{42f} & & & \\Z_{51f} & & & & \end{pmatrix}$

and for the reserve level the triangular matrix
$\begin{pmatrix}R_{11f} & R_{12f} & R_{13f} & R_{14f} & R_{15f} \\R_{21f} & R_{22f} & R_{23f} & R_{24f} & \\R_{31f} & R_{32f} & R_{33f} & & \\R_{41f} & R_{42f} & & & \\R_{51f} & & & & \end{pmatrix}$
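The triangular layout described above can be made concrete with a short sketch. It is only an illustration under the assumption I=K; the function names `known_mask` and `unknown_cells` are hypothetical and do not appear in the specification.

```python
def known_mask(K):
    """K x K grid of booleans: True where the development value for
    initial year i and development year k is known (1-based i + k <= K + 1)."""
    return [[(i + k) <= K + 1 for k in range(1, K + 1)] for i in range(1, K + 1)]


def unknown_cells(K):
    """1-based (i, k) pairs that still have to be predicted:
    for each initial year i, the development years K+2-i, ..., K."""
    return [(i, k) for i in range(1, K + 1) for k in range(K + 2 - i, K + 1)]


# Print the 5x5 triangle: "A" marks a known value, "." a value to be determined.
for row in known_mask(5):
    print(" ".join("A" if known else "." for known in row))
```

For K=5 the first line is fully known and the fifth line contains a single known value, which matches the matrices shown above.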

Thus, in the experience rating of damage events, the development of each individual damage event f_(i) is known from the point in time of the report of damage in the initial year i until the current status (current development year k) or until adjustment. This information may be stored in a database, which database may be called up, e.g., via a network by means of a data processing unit. However, the database may also be accessible directly via an internal data bus of the system according to the invention, or be read out otherwise.

In order to use the data in the example of the claims, the triangular matrices are scaled in a first step, i.e., the damage values must first be made comparable in relation to the assigned time by means of the respective inflation values. The inflation index may likewise be read out of corresponding databases or entered in the system by means of input units. The inflation index for a country may, for example, look like the following:

Year    Inflation Index (%)    Annual Inflation Value
1989    100                    1.000
1990    105.042                1.050
1991    112.920                1.075
1992    121.429                1.075
1993    128.676                1.060
1994    135.496                1.053
1995    142.678                1.053
1996    148.813                1.043
1997    153.277                1.030
1998    157.109                1.025
1999    163.236                1.039
2000    171.398                1.050
2001    177.740                1.037
2002    185.738                1.045
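As an illustration of this scaling step, the following sketch derives the annual inflation value from the index and rescales an amount from one year to another. The index values are those of the table above; the helper names and the choice of rescaling by the ratio of index values are assumptions made for the example, not a procedure prescribed by the text.

```python
# Inflation index (%) per year, taken from the example table.
INFLATION_INDEX = {
    1989: 100.0, 1990: 105.042, 1991: 112.920, 1992: 121.429,
    1993: 128.676, 1994: 135.496, 1995: 142.678, 1996: 148.813,
    1997: 153.277, 1998: 157.109, 1999: 163.236, 2000: 171.398,
    2001: 177.740, 2002: 185.738,
}


def annual_inflation_value(year):
    """Ratio of the index of `year` to the index of the preceding year."""
    return INFLATION_INDEX[year] / INFLATION_INDEX[year - 1]


def scale_amount(amount, from_year, to_year):
    """Express an amount of `from_year` in the price level of `to_year`
    (assumed convention: multiply by the ratio of the two index values)."""
    return amount * INFLATION_INDEX[to_year] / INFLATION_INDEX[from_year]


print(round(annual_inflation_value(1991), 3))
print(scale_amount(1000.0, 1996, 2002))
```

Scaling every entry of the triangular matrices to a common reference year in this way makes values from different initial years comparable before they are presented to the networks.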

Further scaling factors are just as conceivable, such as regional dependencies, etc., for example. If damage events are compared and/or extrapolated in more than one country, respective national dependencies are added. For the general, non-insurance-specific case, the scaling may also relate to dependencies such as, e.g., the mean age of populations of living beings, influences of nature, etc.

For the automated determination of the development values P_(i,K+2−i,f), . . . , P_(i,K,f)=(Z_(i,K+2−i,f), R_(i,K+2−i,f)), . . . , (Z_(i,K,f), R_(i,K,f)), the system and/or method comprises at least one neural network. As neural networks, e.g., conventional static and/or dynamic neural networks may be chosen, such as, for example, feed-forward (heteroassociative) networks such as a perceptron or a multi-layer perceptron (MLP), but also other network structures, such as, e.g., recurrent network structures, are conceivable. The differing network structure of the feed-forward networks in contrast to networks with feedback (recurrent networks) determines the way in which information is processed by the network. In the case of a static neural network, the structure is supposed to ensure the replication of static characteristic fields with sufficient approximation quality. For this embodiment example let multilayer perceptrons be chosen as an example. An MLP consists of a number of neuron layers having at least one input layer and one output layer. The structure is directed strictly forward, and belongs to the group of feed-forward networks. Neural networks quite generally map an m-dimensional input signal onto an n-dimensional output signal. The information to be processed is, in the feed-forward network considered here, received by a layer having input neurons, the input layer. The input neurons process the input signals, and forward them via weighted connections, so-called synapses, to one or more hidden neuron layers, the hidden layers. From the hidden layers, the signal is transmitted, likewise by means of weighted synapses, to neurons of an output layer which, in turn, generate the output signal of the neural network. In a forward directed, completely connected MLP, each neuron of a certain layer is connected to all neurons of the following layer.

The choice of the number of layers and neurons (network nodes) in a particular layer is, as usual, to be adapted to the respective problem. The simplest possibility is to find out the ideal network structure empirically. In so doing, it is to be heeded that if the number of neurons chosen is too large, the network, instead of learning, merely reproduces the training patterns, while with too small a number of neurons it comes to correlations of the mapped parameters. Expressed differently, the fact is that if the number of neurons chosen is too small, the function can possibly not be represented. However, upon increasing the number of hidden neurons, the number of independent variables in the error function also increases. This leads to more local minima and to the greater probability of landing in precisely one of these minima. In the special case of back propagation, this problem can be at least minimized, e.g. by means of simulated annealing. In simulated annealing, a probability is assigned to the states of the network. In analogy to the cooling of liquid material from which crystals are produced, a high initial temperature T is chosen. This is gradually reduced, and the lower it becomes, the more slowly it is reduced. In analogy to the formation of crystals from liquid, it is assumed that if the material is allowed to cool too quickly, the molecules do not arrange themselves according to the grid structure. The crystal becomes impure and unstable at the locations affected. In order to prevent this, the material is allowed to cool down so slowly that the molecules still have enough energy to jump out of a local minimum. In the case of neural networks, nothing different is done: additionally, the magnitude T is introduced in a slightly modified error function. In the ideal case, this then converges toward a global minimum.
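The idea of simulated annealing can be sketched on a one-dimensional error function with several local minima. The sketch uses the common Metropolis acceptance rule with geometric cooling, which is an assumption made for illustration; the text only states that the magnitude T enters a slightly modified error function and does not fix the exact form. The names `anneal` and `bumpy_error` and all parameter values are hypothetical.

```python
import math
import random


def anneal(error, x0, step=0.5, t_start=5.0, t_end=1e-3, cooling=0.95, seed=0):
    """Minimize `error` starting at x0 and return the best (x, error) seen.

    Worse candidates are accepted with probability exp(-delta / T), so the
    search can still leave a local minimum while the temperature T is high.
    """
    rng = random.Random(seed)
    x, e = x0, error(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        candidate = x + rng.uniform(-step, step)
        e_candidate = error(candidate)
        delta = e_candidate - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
        # Geometric cooling: the absolute decrease shrinks as T gets lower.
        t *= cooling
    return best_x, best_e


def bumpy_error(x):
    """Toy error surface with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


x_best, e_best = anneal(bumpy_error, x0=4.0)
print(x_best, e_best)
```

Because the best state seen so far is tracked separately, the returned error is never worse than the error at the starting point, whatever path the random search takes.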

For the application to experience rating, MLPs having an at least three-layered structure have proved useful. That means that the networks comprise at least one input layer, a hidden layer, and an output layer. Within each neuron, the three processing steps of propagation, activation, and output take place. As output of the i-th neuron of the k-th layer there results
$o_{i}^{k} = f_{i}^{k}\left( \sum\limits_{j} w_{i,j}^{k} \cdot o_{i,j}^{k-1} + b_{i,j}^{k} \right)$

where, e.g. for k=2, the running index j takes the values j=1, 2, . . . ,N₁; N₁ designates the number of neurons of the layer k−1, w the weight, and b the bias (threshold value). Depending upon the application, the bias b may be chosen the same or different for all neurons of a certain layer. As activation function, e.g., a log-sigmoidal function may be chosen, such as
$f_{i}^{k}(\xi) = \frac{1}{1 + e^{-\xi}}$
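A minimal sketch of a single neuron with the log-sigmoidal activation is given below. It also shows the well-known identity f'(ξ) = f(ξ)·(1 − f(ξ)) for the derivative of this function. The function names are chosen for the example only.

```python
import math


def logsig(xi):
    """Log-sigmoidal activation f(xi) = 1 / (1 + exp(-xi))."""
    return 1.0 / (1.0 + math.exp(-xi))


def logsig_prime(xi):
    """Derivative of the log-sigmoid, using f'(xi) = f(xi) * (1 - f(xi))."""
    s = logsig(xi)
    return s * (1.0 - s)


def neuron_output(weights, inputs, bias):
    """Propagation (weighted sum plus bias) followed by activation."""
    xi = sum(w * o for w, o in zip(weights, inputs)) + bias
    return logsig(xi)


print(neuron_output([1.0, -1.0], [0.5, 0.5], 0.0))
```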

The activation function (or transfer function) is inserted in each neuron. Other activation functions such as tangential functions, etc., are, however, likewise possible according to the invention. With the back-propagation method, however, it is to be heeded that a differentiable activation function is used, such as e.g. a sigmoid function, since this is a prerequisite for the method. That is, therefore, binary activation functions such as, e.g.,
$f(x) := \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}$

do not work for the back-propagation method. In the neurons of the output layer, the outputs of the last hidden layer are summed up in a weighted way. The activation function of the output layer may also be linear. The entirety of the weightings W_(i,j)^(k) and bias B_(i,j)^(k) combined in the parameter and/or weighting matrices determine the behavior of the neural network structure
$W^{k} = (w_{i,j}^{k}) \in \mathbb{R}^{N \cdot N^{k}}$

Thus the result is
$o^{k} = B^{k} + W^{k} \cdot \left(1 + e^{-(B^{k-1} + W^{k-1} \cdot u)}\right)^{-1}$
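The matrix form of the network output can be illustrated with a small forward pass for a three-layer MLP with a log-sigmoidal hidden layer and a linear output layer. This is a plain-Python sketch; the list-of-lists representation of the matrices and the function names are assumptions made for the example.

```python
import math


def affine(W, b, v):
    """Compute W * v + b for a matrix W (list of rows) and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def logsig_vec(v):
    """Apply the log-sigmoid element-wise."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def forward(W_hidden, B_hidden, W_out, B_out, u):
    """o = B_out + W_out * logsig(B_hidden + W_hidden * u)."""
    hidden = logsig_vec(affine(W_hidden, B_hidden, u))
    return affine(W_out, B_out, hidden)


# Three inputs, two hidden neurons, one linear output neuron.
W1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
B2 = [0.25]
print(forward(W1, B1, W2, B2, [3.0, -1.0, 2.0]))
```

With all hidden weights set to zero, both hidden neurons output 0.5 regardless of the input, so the example isolates the effect of the output layer.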

The way in which the network is supposed to map an input signal onto an output signal, i.e., the determination of the desired weights and bias of the network, is achieved by training the network by means of training patterns. The set of training patterns (index μ) consists of the input signal
$Y^{\mu} = \left[ y_{1}^{\mu}, y_{2}^{\mu}, \ldots, y_{N_{1}}^{\mu} \right]$

and an output signal
$U^{\mu} = \left[ u_{1}^{\mu}, u_{2}^{\mu}, \ldots, u_{N_{1}}^{\mu} \right]$

In this embodiment example with the experience rating of claims, the training patterns comprise the known events P_(i,f) with the known development values P_(ikf) for all k, f, and i. Here the development values of the events to be extrapolated may naturally not be used for training the neural networks since the output value corresponding to them is lacking.

At the start of the learning operation, the initialization of the weights of the hidden layers (in this example, of the neurons with a log-sigmoidal activation function) is carried out, e.g., according to Nguyen-Widrow (D. Nguyen, B. Widrow, “Improving the Learning Speed of 2-Layer Neural Networks by Choosing Initial Values of Adaptive Weights,” International Joint Conference of Neural Networks, Vol. 3, pp. 21-26, July 1990). If a linear activation function has been chosen for the neurons of the output layer, the weights may be initialized, e.g., by means of a symmetrical random number generator. For training the network, various prior art learning methods may be used, such as e.g. the back-propagation method, learning vector quantization, radial basis function, Hopfield algorithm, or Kohonen algorithm, etc. The task of the training method consists in determining the synapse weights w_(i,j) and bias b_(i,j) within the weighting matrix W and/or the bias matrix B in such a way that the input patterns Y^(μ) are mapped onto the corresponding output patterns U^(μ). For judging the learning stage, the absolute quadratic error
${Err} = \frac{1}{2} \sum\limits_{\mu = 1}^{p} \sum\limits_{\lambda = 1}^{m} \left( u_{{eff},\lambda}^{\mu} - u_{{soll},\lambda}^{\mu} \right)^{2} = \sum\limits_{\mu = 1}^{p} {Err}^{\mu}$

may be used, for example. The error Err thus takes into consideration all patterns P_(ikf) of the training basis and measures how far the actual output signals U_(eff)^(μ) deviate from the target reactions U_(soll)^(μ) specified in the training basis. For this embodiment example, the back-propagation method shall be chosen as the learning method. The back-propagation method is a recursive method for optimizing the weight factors w_(ij). In each learning step, an input pattern Y^(μ) is randomly chosen and propagated through the network (forward propagation). By means of the above-described error function Err, the error Err^(μ) on the presented input pattern is determined from the output signal generated by the network and the target reaction U_(soll)^(μ) specified in the training basis. The modifications of the individual weights w_(ij) after the presentation of the μ-th training pattern are thereby proportional to the negative partial derivative of the error Err^(μ) with respect to the weight w_(ij) (so-called gradient descent method)
$\Delta w_{i,j}^{\mu} \propto -\frac{\partial {Err}^{\mu}}{\partial w_{i,j}}$

With the aid of the chain rule, the known adaptation specifications, known as the back-propagation rule, for the elements of the weighting matrix in the presentation of the μ-th training pattern can be derived from the partial derivative:
$\Delta w_{i,j}^{\mu} = s \cdot \delta_{i}^{\mu} \cdot u_{{eff},j}^{\mu}$
with
$\delta_{i}^{\mu} = f'(\xi_{i}^{\mu}) \cdot \left( u_{{soll},i}^{\mu} - u_{{eff},i}^{\mu} \right)$

for the output layer, and
$\delta_{i}^{\mu} = f'(\xi_{i}^{\mu}) \cdot \sum\limits_{k}^{K} \delta_{k}^{\mu} w_{k,i}$

for the hidden layers, respectively. Here the error is propagated through the network in the opposite direction (back propagation) beginning with the output layer and divided among the individual neurons according to the costs-by-cause principle. The proportionality factor s is called the learning factor. During the training phase, a limited number of training patterns is presented to a neural network, which patterns characterize precisely enough the map to be learned. In this embodiment example, with the experience rating of damage events, the training patterns may comprise all known events P_(i,f) with the known development values P_(ikf) for all k, f, and i. But a selection of the known events P_(i,f) is also conceivable. If thereafter the network is presented with an input signal which does not agree exactly with the patterns of the training basis, the network interpolates or extrapolates between the training patterns within the scope of the learned mapping function. This property is called the generalization capability of the networks. It is characteristic of neural networks that they possess good error tolerance. This is a further advantage as compared with the prior art systems. Since neural networks map a plurality of (partially redundant) input signals upon the desired output signal(s), the networks prove to be robust toward the failure of individual input signals and/or toward signal noise. A further interesting property of neural networks is their adaptive capability. Hence it is possible in principle to have a once-trained system relearn or adapt permanently/periodically during operation, which is likewise an advantage as compared with the prior art systems. For the learning method, other methods may naturally also be used, such as e.g. a method according to Levenberg-Marquardt (D. Marquardt, “An Algorithm for least square estimation of non-linear Parameters,” J.Soc.Ind.Appl.Math., pp. 431-441, 1963, as well as M. T. Hagan, M. B. Menhaj, “Training Feed-forward Networks with the Marquardt Algorithm,” IEEE Transactions on Neural Networks, Vol. 5, No. 6, pp. 989-993, November 1994). The Levenberg-Marquardt method is a combination of the gradient method and the Newton method, and has the advantage that it converges faster than the above-mentioned back-propagation method, but needs a greater storage capacity during the training phase.
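The back-propagation rule described above can be sketched for a network with one log-sigmoidal hidden layer and a linear output layer. The sketch is an illustration under assumptions: the weight values, the learning factor s=0.1, the single training pattern, and the function name `train_step` are all chosen for the example and are not taken from the specification.

```python
import math


def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(W1, B1, W2, B2, y, u_soll, s=0.1):
    """One back-propagation step on one pattern (y, u_soll).

    Updates the weights in place and returns the error Err^mu measured
    before the update.
    """
    # Forward propagation.
    xi1 = [sum(w * v for w, v in zip(row, y)) + b for row, b in zip(W1, B1)]
    h = [logsig(x) for x in xi1]
    u_eff = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(W2, B2)]
    err = 0.5 * sum((e - t) ** 2 for e, t in zip(u_eff, u_soll))

    # Output layer: linear activation, so f' = 1 and delta = u_soll - u_eff.
    d_out = [t - e for t, e in zip(u_soll, u_eff)]
    # Hidden layer: delta_i = f'(xi_i) * sum_k delta_k * w_(k,i).
    d_hid = [h[i] * (1.0 - h[i]) * sum(d_out[k] * W2[k][i] for k in range(len(d_out)))
             for i in range(len(h))]

    # Weight change = s * delta_i * (signal arriving over the synapse).
    for k in range(len(W2)):
        for i in range(len(h)):
            W2[k][i] += s * d_out[k] * h[i]
        B2[k] += s * d_out[k]
    for i in range(len(W1)):
        for j in range(len(y)):
            W1[i][j] += s * d_hid[i] * y[j]
        B1[i] += s * d_hid[i]
    return err


W1 = [[0.1, -0.2], [0.3, 0.4]]
B1 = [0.0, 0.0]
W2 = [[0.5, -0.5]]
B2 = [0.0]
errors = [train_step(W1, B1, W2, B2, [1.0, 0.5], [1.0]) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the step on the same pattern drives the error toward zero; in practice the patterns would be drawn at random from the training basis, as described above.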

In the embodiment example, for determining the development values P_(i,K−(i−j)+1,f), (i−1) neural networks N_(i,j) are generated iteratively for each initial year i. j indicates, for a certain initial year i, the number of iterations, with j=1, . . . ,(i−1). Thereby, for the i-th initial year, i−1 neural networks N_(i,j) are generated. The neural network N_(i,j+1) depends recursively here on the neural network N_(i,j). For weighting, i.e., for training, a certain neural network N_(i,j), e.g., all development values P_(p,q,f) with p=1, . . . ,(i−1) and q=1, . . . ,K−(i−j) of the events or losses P_(pq) may be used. A limited selection may also be useful, however, depending upon the application. The data of the events P_(pq) may, for instance, as mentioned be read out of a database and presented to the system via a data processing unit. A calculated development value P_(i,k,f) may, e.g., be assigned to the respective event P_(i,f) of an initial year i and itself be presented to the system for determining the next development value (e.g., P_(i,k+1,f)) (FIGS. 1 to 6), or the assignment takes place only after the end of the determination of all development values P sought (FIG. 7).

In the first case (FIGS. 1 to 6), as described, development values P_(i,k,f) with development year k=1, . . . ,K are assigned to a certain event P_(i,f) of an initial year i, whereby the initial years are i=1, . . . ,K, and K is the last known development year. For the first initial year i=1, all development values P_(1,k,f) are known. For each initial year i=2, . . . ,K by means of iterations j=1, . . . ,(i−1), upon each iteration j, in a first step, a neural network N_(i,j) is generated with an input layer with K−(i−j) input segments and an output layer. Each input segment comprises at least one input neuron and/or at least as many input neurons as are needed to obtain the input signal for a development value P_(i,k,f). The neural networks are automatically generated by the system, and may be implemented by means of hardware or software. In a second step, the neural network N_(i,j) is weighted with the available events P_(m,f) of all initial years m=1, . . . ,(i−1) by means of the development values P_(m,1 . . . K−(i−j),f) as input and P_(m,1 . . . K−(i−j)+1,f) as output. In a third step, by means of the neural network N_(i,j), the output values O_(i,f) are determined for all events P_(i,f) of the initial year i, the output value O_(i,f) being assigned to the development value P_(i,K−(i−j)+1,f) of the event P_(i,f), and the neural network N_(i,j+1) depending recursively on the neural network N_(i,j). FIG. 1 shows the training and/or presentation phase of a neural network for determining the event value P_(2,5,f) of an event P_(f) in an upper 5×5 matrix, i.e., with K=5. The dashed line T indicates the training phase, and the solid line R indicates the determination phase after learning. FIG. 2 shows the same thing for the third initial year for determining P_(3,4,f) (B₃₄), and FIG. 3 for determining P_(3,5,f). FIG. 4 shows only the training phase for determining P_(3,4,f) and P_(3,5,f), the generated values P_(3,4,f) (B₃₄) being used for training the network for determining P_(3,5,f). A_(ij) indicates the known values in the figures, while B_(ij) denotes the values determined by means of the networks. FIG. 5 shows the recursive generation of the neural networks for determining the values in line 3 of a 5×5 matrix, i−1 networks being generated, thus two. FIG. 6, on the other hand, shows the recursive generation of the neural networks for determining the values in line 5 of a 5×5 matrix, i−1 networks again being generated, thus four.
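The index scheme of the three steps can be checked with a short sketch that only enumerates which network N_(i,j) uses which development years and which initial years. It does not train anything; the dictionary layout and the name `network_plan` are assumptions made for the example.

```python
def network_plan(K):
    """Describe every network N_(i,j) for i = 2..K and j = 1..i-1.

    Each entry lists the development years used as input segments, the
    development year whose value is produced, and the initial years whose
    events are used for weighting (training).
    """
    plan = []
    for i in range(2, K + 1):
        for j in range(1, i):
            n_segments = K - (i - j)  # number of input segments
            plan.append({
                "network": (i, j),
                "input_years": list(range(1, n_segments + 1)),
                "target_year": n_segments + 1,  # K - (i - j) + 1
                "training_rows": list(range(1, i)),  # m = 1 .. i-1
            })
    return plan


# Networks generated for the third initial year of a 5x5 matrix.
for entry in network_plan(5):
    if entry["network"][0] == 3:
        print(entry)
```

For K=5 and i=3 this yields two networks, the first producing the value for development year 4 and the second, with one more input segment, the value for development year 5, in line with FIGS. 2 to 5.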

It is important to point out that, as an embodiment example, the assignment of the event values B_(ij) generated by means of the system may also take place only after determination of all sought development values P. The newly determined values are then not available as input values for determination of further event values. FIG. 7 shows such a method, the training basis being limited to the known event values A_(ij). In other words, the neural networks N_(ij) may be identical for the same j, the neural network N_(i+1,j=i) being generated for an initial time interval i+1, and all other neural networks N_(i+1,j<i) corresponding to networks of earlier initial time intervals. This means that a network, which was once generated for calculation of a particular event value P_(ij), is further used for all event values with an initial year a>i for the values P_(ij) with same j.

In the insurance cases discussed here, different neural networks may be trained, e.g. based on different data. For example, the networks may be trained based on the paid claims, based on the incurred claims, based on the paid and still outstanding claims (reserves) and/or based on the paid and incurred claims. The best neural network for each case may be determined e.g. by means of minimizing the absolute mean error of the predicted values and the actual values. For example, the ratio of the mean error to the mean predicted value (of the known claims) may be applied to the predicted values of the modeled values in order to obtain the error. For the case where the predicted values of the previous initial years are co-used for calculation of the following initial years, the error must of course be correspondingly cumulated. This can be achieved e.g. in that the square root of the sum of the squares of the individual errors of each model is used.
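The two error rules mentioned above can be written down in a few lines. The function names and the numbers in the example are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def error_for_prediction(predicted, mean_error, mean_predicted):
    """Apply the ratio of mean error to mean predicted value (taken from
    the known claims) to a predicted value."""
    return predicted * (mean_error / mean_predicted)


def cumulated_error(individual_errors):
    """Square root of the sum of the squares of the individual errors."""
    return math.sqrt(sum(e * e for e in individual_errors))


print(error_for_prediction(200.0, 5.0, 100.0))
print(cumulated_error([3.0, 4.0]))
```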

To obtain a further estimate of the quality and/or training state of the neural networks, e.g. the predicted values can also be fitted by means of the mentioned Pareto distribution. This estimation can also be used to determine e.g. the best neural network from among neural networks (e.g. paid claims, outstanding claims, etc.) trained with different sets of data (as described in the last paragraph). It thereby follows with the Pareto distribution
$\chi^{2} = \sum\limits_{i} \left( \frac{O(i) - T(i)}{E(i)} \right)^{2}$
with
$T(i) = Th \cdot (1 - P(i))^{-1/\alpha}$

where α is the fit parameter, Th the threshold parameter (threshold value), T(i) the theoretical value of the i-th payment demand, O(i) the observed value of the i-th payment demand, E(i) the error of the i-th payment demand, and P(i) the cumulated probability of the i-th payment demand with
$P(1) = \frac{1}{2n}$
and
$P(i + 1) = P(i) + \frac{1}{n}$

and n the number of payment demands. For the embodiment example here, the error of the systems based on the proposed neural networks was compared with the chain ladder method with reference to vehicle insurance data. The networks were compared once with the paid claims and once with the incurred claims. In order to compare the data, the individual values were cumulated in the development years. The direct comparison showed the following results for the selected example data per 1000 (all entries are cumulated values):

                System Based on Neural Networks            Chain Ladder Method
Initial Year    Paid Claims          Incurred Claims       Paid Claims            Incurred Claims
1996            369.795 ± 5.333      371.551 ± 6.929       387.796 ± n/a          389.512 ± n/a
1997            769.711 ± 6.562      789.997 ± 8.430       812.304 ± 0.313        853.017 ± 15.704
1998            953.353 ± 40.505     953.353 ± 30.977      1099.710 ± 6.522       1042.908 ± 32.551
1999            1142.874 ± 84.947    1440.038 ± 47.390     1052.683 ± 138.221     1385.249 ± 74.813
2000            864.628 ± 99.970     1390.540 ± 73.507     1129.850 ± 261.254     1285.956 ± 112.668
2001            213.330 ± 72.382     288.890 ± 80.617      600.419 ± 407.718      1148.555 ± 439.112

The error shown here corresponds to the standard deviation, i.e. the 1σ error, for the indicated values. In particular for later initial years, i.e. initial years with greater i, the system based on neural networks shows a clear advantage in the determination of values compared to the prior art methods in that the errors remain substantially stable. This is not the case in the state of the art, where the error grows disproportionately for increasing i. For greater initial years i, a clear deviation in the amount of the cumulated values is demonstrated between the chain ladder values and those which were obtained with the method according to the invention. This deviation is based on the fact that in the chain ladder method the IBNYR (Incurred But Not Yet Reported) losses have been additionally taken into account. The IBNYR damage events would have to be added to the above-shown values of the method according to the invention. For example, for calculation of the portfolio reserves, the IBNYR damage events can be taken into account by means of a separate development (e.g. chain ladder). In reserving for individual losses or in determining loss amount distributions, the IBNYR damage events play no role, however.
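Returning to the Pareto-based quality estimate described further above, the χ² statistic can be sketched as follows. The sketch assumes that the payment demands are sorted in ascending order before they are compared with the theoretical values T(i); the text does not state the ordering explicitly. All function names and example numbers are hypothetical.

```python
def cumulated_probabilities(n):
    """P(1) = 1/(2n) and P(i+1) = P(i) + 1/n for i = 1..n-1."""
    probs = [1.0 / (2 * n)]
    for _ in range(n - 1):
        probs.append(probs[-1] + 1.0 / n)
    return probs


def theoretical_values(n, th, alpha):
    """T(i) = Th * (1 - P(i)) ** (-1 / alpha)."""
    return [th * (1.0 - p) ** (-1.0 / alpha) for p in cumulated_probabilities(n)]


def chi_square(observed, errors, th, alpha):
    """Sum over i of ((O(i) - T(i)) / E(i)) ** 2.

    Assumption: observed values (with their errors) are sorted ascending,
    so that the i-th smallest demand is compared with T(i).
    """
    pairs = sorted(zip(observed, errors))
    theo = theoretical_values(len(pairs), th, alpha)
    return sum(((o - t) / e) ** 2 for (o, e), t in zip(pairs, theo))


demands = [120.0, 310.0, 150.0, 200.0]
print(chi_square(demands, [10.0] * 4, th=100.0, alpha=1.5))
```

A smaller χ² for one network than for another, on the same data, would indicate that its predicted values follow the fitted Pareto distribution more closely.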

LIST OF REFERENCE SYMBOLS

T training phase

L determination phase after learning

A_(ij) known event values

B_(ij) event values generated by means of the system

1.-23. (canceled)
 24. Computer-based system for automated experience rating and/or loss reserving, a certain event P_(if) of an initial time interval i including development values P_(ikf) of the development intervals k=1, . . . ,K, K being the last known development interval with i=1, . . . , K, and all development values P_(1kf) being known, characterized in that the system for automated determination of the development values P_(i,K+2−i,f), . . . ,P_(i,K,f) comprises at least one neural network, the system for determination of the development values P_(i,K+2−i,f), . . . ,P_(i,K,f) of an event P_(i,f) comprising (i−1) iteratively generated neural networks N_(ij) for each initial time interval i with j=1, . . . ,(i−1), and the neural network N_(i,j+1) depending recursively on the neural network N_(ij).
 25. Computer-based system according to claim 24, characterized in that for the events the initial time interval corresponds to an initial year, and the development intervals correspond to development years.
 26. Computer-based system according to claim 24, characterized in that training values for weighting a particular neural network N_(ij) comprise the development values P_(p,q,f) with p=1, . . . ,(i−1) and q=1, . . . ,K−(i−j).
 27. Computer-based system according to claim 24, characterized in that the neural networks N_(ij) for the same j are identical, the neural network N_(i+1,j=i) being generated for an initial time interval i+1, and all other neural networks N_(i+1,j<i) corresponding to networks of earlier initial time intervals.
 28. Computer-based system according to claim 24, characterized in that the system further comprises events P_(i,f) with initial time interval i<1, all development values P_(i<1,k,f) being known for the events P_(i<1,f).
 29. Computer-based system according to claim 24, characterized in that the system comprises at least one scaling factor by means of which the development values P_(ikf) of the different events P_(i,f) are scalable according to their initial time interval.
 30. Computer-based method for automated experience rating and/or loss reserving, development values P_(ikf) with development intervals k=1, . . . , K being assigned to a certain event P_(if) of an initial time interval i, K being the last known development interval with i=1, . . . , K, and all development values P_(1kf) being known for the events P_(1,f), characterized in that at least one neural network is used for determination of the development values P_(i,K+2−i,f), . . . ,P_(i,K,f), neural networks N_(ij) being generated iteratively (i−1) for each initial time interval i with j=1, . . . ,(i−1), for determination of the development values P_(i,K−(i−j)+1,f), and the neural network N_(i,j+1) depending recursively on the neural network N_(ij).
 31. Computer-based method according to claim 30, characterized in that for the events the initial time interval is assigned to the initial year, and the development intervals are assigned to development years.
 32. Computer-based method according to claim 30, characterized in that for weighting a particular neural network N_(i,j), the development values P_(p,q,f) with p=1, . . . ,(i−1) and q=1, . . . , K−(i−j) are used.
 33. Computer-based method according to claim 30, characterized in that the neural networks N_(ij) for same j are trained identically, the neural network N_(i+1,j=i) being generated for an initial time interval i+1, and all other neural networks N_(i+1,j<i) of earlier initial time intervals being taken over.
 34. Computer-based method according to claim 30, characterized in that used in addition for determination are events P_(i,f) with initial time interval i<1, all development values P_(i<1,k,f) being known for the events P_(i<1,f).
 35. Computer-based method according to claim 30, characterized in that by means of at least one scaling factor the development values P_(ikf) of the different events P_(i,f) are scaled according to their initial time interval.
 36. Computer-based method for automated experience rating and/or loss reserving, development values P_(i,k,f) with development intervals k=1, . . . , K being stored assigned to a certain event P_(i,f) of an initial time interval i, whereby i=1, . . . , K and K is the last known development interval, and whereby all development values P_(1,k,f) are known for the first initial time interval, characterized in that, in a first step, for each initial time interval i=2, . . . ,K, by means of iterations j=1, . . . ,(i−1), at each iteration j, a neural network N_(ij) is generated with an input layer with K−(i−j) input segments and an output layer, each input segment comprising at least one input neuron and being assigned to a development value P_(i,k,f), in that, in a second step, the neural network N_(ij) is weighted with the available events P_(i,f) of all initial time intervals m=1, . . . ,(i−1) by means of the development values P_(m,1 . . . K−(i−j),f) as input and P_(m,1 . . . K−(i−j)+1,f) as output, and in that, in a third step, by means of the neural network N_(ij) the output values O_(i,f) for all events P_(i,f) of the initial year i are determined, the output value O_(i,f) being assigned to the development value P_(i,K−(i−j)+1,f) of the event P_(i,f), and the neural network N_(ij) depending recursively on the neural network N_(ij+1).
 37. Computer-based method according to claim 36, characterized in that for the events the initial time interval is assigned to an initial year, and the development intervals are assigned to development years.
 38. Systemof neural networks, which neural networks N_(i) each comprise an inputlayer with at least one input segment and an output layer, the inputlayer and output layer comprising a multiplicity of neurons which areconnected to one another in a weighted way, characterized in that theneural networks N_(i) are able to be generated iteratively usingsoftware and/or hardware by means of a data processing unit, a neuralnetwork N_(i+1)depending recursively on the neural network N_(i), andeach network N_(i+1)comprising in each case one input segment more thanthe network N_(i), in that, beginning at the neural network N_(i), eachneural network N_(i) is trainable by means of a minimization module byminimizing a locally propagated error, and in that the recursive systemof neural networks is trainable by means of a minimization module byminimizing a globally propagated error based on the local error of theneural network N_(i).
 39. System of neural networks according to claim 38, characterized in that the output layer of the neural network N_(i) is connected to at least one input segment of the input layer of the neural network N_(i+1) in an assigned way.
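By way of illustration only, the connection of the output of one network to an additional input segment of the next network can be sketched as a forward pass through a chain of models. The sketch assumes one neuron per segment and uses fixed linear weights in place of trained networks. It does not show the local or global error minimization. The function name `chain_forward` and the weight values are hypothetical.

```python
import numpy as np


def chain_forward(x, weights):
    """Roll a chain of linear stand-in networks forward.

    weights[n] must have exactly one more entry than weights[n - 1], because
    each network receives the previous network's output as an extra input.
    """
    values = [float(v) for v in x]
    for w in weights:
        if len(w) != len(values):
            raise ValueError("each network needs one more input than the last")
        # The output of this network becomes the extra input of the next one.
        values.append(float(np.dot(w, values)))
    return values


# Hypothetical weights: each stand-in network doubles its most recent input.
result = chain_forward([1.0, 2.0], [np.array([0.0, 2.0]), np.array([0.0, 0.0, 2.0])])
```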
 40. Computer program product which comprises a computer-readable medium with computer program code means contained therein for control of one or more processors of a computer-based system for automated experience rating and/or loss reserving, development values P_(i,k,f) with development intervals k=1, . . . ,K being stored assigned to a certain event P_(i,f) of an initial time interval i, whereby i=1, . . . ,K, and K is the last known development interval, and all development values P_(1,k,f) being known for the first initial time interval i=1, characterized in that by means of the computer program product at least one neural network is able to be generated using software and is usable for determination of the development values P_(i,K+2−i,f), . . . , P_(i,K,f), whereby, for determination of the development values P_(i,K−(i−j)+1,f) neural networks N_(ij) are able to be generated for each initial time interval i by means of the computer program iteratively (i−1) with j=1, . . . ,(i−1), and whereby the neural network N_(i,j+1) depends recursively on the neural network N_(ij).
 41. Computer program product according to claim 40, characterized in that for the events the initial time interval is assigned to an initial year, and the development intervals are assigned to development years.
 42. Computer program product according to claim 40, characterized in that for weighting a particular neural network N_(ij) by means of the computer program product the development values P_(p,q,f) with p=1, . . . ,(i−1) and q=1, . . . ,K−(i−j) are readable from a database.
 43. Computer program product according to claim 40, characterized in that with the computer program product the neural networks N_(ij) are trained identically for the same j, the neural network N_(i+1,j=i) being generated for an initial time interval i+1 by means of the computer program product, and all other neural networks N_(i+1,j<i) of earlier initial intervals being taken over.
 44. Computer program product according to claim 40, characterized in that the database additionally comprises in a stored way events P_(i,f) with initial time interval i<1, all development values P_(i<1,k,f) being known for the events P_(i<1,f).
 45. Computer program product according to claim 40, characterized in that the computer program product comprises at least one scaling factor by means of which the development values P_(ikf) of the different events P_(i,f) are scalable according to their initial time interval.
 46. Computer program product which is loadable in the internal memory of a digital computer and comprises software code segments with which the steps according to claim 30 are able to be carried out when the product is running on a computer, the neural networks being able to be generated through software and/or hardware.